/*----------------------------------------------------------------------*/
/* PROGRAM: master.do							*/
/*									*/
/*----------------------------------------------------------------------*/

*************************************************
* SET UP 					*
*************************************************

* Set the Stata version that these do-files are written for
version 14.1

* Set up the main Stata settings 
clear all
clear mata
set matsize 11000
set maxvar 120000, perm
set type double
set more off, perm
ssc install blindschemes, replace all
ssc install confirmdir, replace all
ssc install distinct, replace all
ssc install mtebinary, replace all
set scheme plotplainblind, perm


*************************************************
* SET UP DIRECTORIES 				*
*************************************************

* Define the global ohie as the parent directory of the project before running
* this file. If it is undefined, it defaults to the current working directory.
if "$ohie" == ""{
	global ohie ///
	"`c(pwd)'" //defines $ohie as the current folder if it's undefined.
}

cd "$ohie" // quotes keep paths that contain spaces intact
* The following lines create subdirectories within the parent directory to organize
* project files.

* Set the directory for do-files and raw data (global dofiles) to $ohie. To use a
* separate subdirectory, change the path below (to "$ohie/dofiles_and_rawdata", for example).
confirmdir "$ohie" 
if `r(confirmdir)'!=0 {
	mkdir "$ohie"	
}
global dofiles "$ohie"

* Subdirectory for storing intermediate analytic datasets 
confirmdir "$ohie/data" //confirms whether subdirectory exists
if `r(confirmdir)'!=0 {
	mkdir "$ohie/data" //creates new directory if it does not exist
	}
global final "$ohie/data"
global final_analytic "$ohie/data"

* Subdirectory for storing logs
confirmdir "$ohie/logs"
if `r(confirmdir)'!=0 {
	mkdir "$ohie/logs"
	}
global log "$ohie/logs"

* Subdirectory for storing outputs
confirmdir "$ohie/output"
if `r(confirmdir)'!=0 {
	mkdir "$ohie/output"
	}
global output "$ohie/output"



*************************************************
* OTHER SETUP					*
*************************************************

* Set up log
*cap log close
*global sysdate: disp %tdYYNNDD  date("`c(current_date)'", "DMY")
*qui log using 	"$log/master_$sysdate.log", replace

* Set up seeds
* These seeds stay the same across all do-files. Each of the three outcomes
* has its own seed, so that every do-file draws the same bootstrap samples
* for a given outcome and therefore recovers exactly the same confidence
* intervals for an estimate that is computed in two different do-files.
* Separate seeds also let us loop through the three outcomes in any order
* and still recover the same confidence intervals.
global Y_charges_seed = 6574356
global Y_any_seed = 	6574357
global Y_num_seed = 	6574358
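
* Illustrative sketch only (not part of the original replication code): the
* loop below shows how a do-file could look up the seed for an outcome by
* name. It assumes the outcome stubs are charges, any, and num, as implied by
* the global names above. The individual do-files may set their seeds
* differently. The loop only displays the values and does not change the
* random-number state. A do-file would typically pass the matching global to
* -set seed- immediately before its bootstrap loop for that outcome.
foreach y in charges any num {
	display as text "Seed for outcome Y_`y': ${Y_`y'_seed}"
}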

* Set flags to switch modules of the program on (1) or off (0). Flags without
* a matching -do- call in this file have no effect here.
local ohie_data_setup 1
local ohie_replication 0
local ohie_sumstats 1 
local preER_adv_seln 1 
local diff_in_diff 0
local linmte_no_covars_graph_data 1 
local linmte_no_covars_treat_eff 1 
local linmte_no_covars_subgroup 0
local global_polynomial_all_specs 1 
local characteristics_by_ate 0
local montecarlo 0
local natexp 0
local brfss_data_setup 1 
local brfss_diff_diff 1
local extrapolation_late 1 
local extrapolation_slate 1
local late_reweighting 1
local linmte_eligibility_graph_data 0
local propensity_score_graphs 0
local brfss_sumstats 0
local ohie_rep_additional 0
local linmte_no_covars_subgroup_add 0
local linmte_covars_treat_eff 0
local complier_neg_TEs 0
local ks_LATE_test 0


*************************************************
* RUN THE MAIN FILES			*
*************************************************

/* Load the ohie_data_setup.do file */

* This code creates two analytic data sets for the OHIE data: one for the
* 1-lottery-entrant sample and one for the full data. Most of the subsequent
* do-files use these analytic data sets as input, particularly the data set
* for the OHIE 1-lottery-entrant sample, which is our baseline working sample
* for this project.

* OUTPUT:											
* [*]	intermediate/oregonnumhh1.dta: This is the analytic data set for 
* 	the OHIE 1 lottery entrant sample and will be used throughout most .do 
*	files as an input data set.	
* [*]	intermediate/oregonpooled.dta: This is the analytic data set for 
* 	the full OHIE sample.																					

if `ohie_data_setup' do "$dofiles/ohie_data_setup.do"

/* Load the ohie_sumstats.do file */

* This code runs and bootstraps the internal and external validity tests, as	
* well as the difference between compliers for the OHIE 1 lottery entrants
* sample. The external validity test is also called the difference-in-difference
* test. The results produced here are reported in the top panel of Table									
* "ohie_brfss_sumstats_new" in the paper.		

* OUTPUT:												
* [*]	diff_in_diff_ohie.xls: This output file contains the estimate and
*	bootstrapped standard errors reported in the top panel of Table
*	"ohie_brfss_sumstats_new" of the paper.								

if `ohie_sumstats' do "$dofiles/ohie_sumstats.do"	
	

/* Load the linmte_no_covars_graph_data.do file */

* This code outputs the data necessary for graphing the MTE bounds and the linear
* MTE without covariates (also called MTE(p)). The data is used in the following
* exhibits of the paper (this code does not create the exhibits; it only creates
* the data needed to support them):
* 	- averages_Y_num_eqs	
* 	- identified_outcomes													
															
* OUTPUT:												
* [*]	linmte_no_covars_graph_data.xls: This output file contains the
* 	data necessary for replicating the above-mentioned exhibits of the
*	paper.
	
if `linmte_no_covars_graph_data' do "$dofiles/linmte_no_covars_graph_data.do"	


/* Load the preER_adv_seln.do file */

* This code calculates the mean of previous ER buckets in the OHIE sample across
* the three compliance types. It outputs data necessary for graphing the 
* "preER_adv_seln" figure in the paper.

* OUTPUT:												
* [*]	preER_adv_seln.xls: This output file contains the 
* 	data necessary for replicating the "preER_adv_seln" figure of the paper.

if `preER_adv_seln' do "$dofiles/preER_adv_seln.do"	

											
	
/* Load the linmte_no_covars_treat_eff.do file */

* This code calculates the treated outcomes, untreated outcomes, and treatment
* effects associated with the linear MTE without covariates [MTE(p)] and
* bootstraps these values to obtain bootstrapped confidence intervals and
* significance stars. The output data is used in the following exhibits of the
* paper (this code does not create the exhibits; it only creates the data
* needed to support them):
* 		- sumstats_Y_num_UOT					
* 		- mte_linear_Y_num													

* OUTPUT:													
* [*] 	linmte_no_covars_treat_eff: This output file contains
*	all data necessary for replicating the above-mentioned exhibits.				

if `linmte_no_covars_treat_eff' do "$dofiles/linmte_no_covars_treat_eff.do"
	
	
/* Load the global_polynomial_all_specs.do file */	

* This code outputs the data necessary to create the figures in the paper that 
* show the MTE with covariates:		
*	- linmte_ptiles_Y_num_preutilization_line		
* 	- smte_Y_num_binned		
	
* OUTPUT:												
* [*]	global_polynomial_all_specs: 	This output file contains the 
* data necessary for replicating the above-mentioned figures in the paper.									
			
if `global_polynomial_all_specs' do "$dofiles/global_polynomial_all_specs.do"


	
/* Load the brfss_data_setup.do file */

* This code creates the analytic data set for the BRFSS data to use for 
* extrapolation. We only use the Massachusetts data for extrapolation. 							

* OUTPUT:
* [*]	$final/brfss.dta: This is the analytic data set for the BRFSS 
*	data and will be used as input data set for the extrapolation exercises.			
			
if `brfss_data_setup' do "$dofiles/brfss_data_setup.do" 



/* Load the brfss_diff_diff.do file */

* This code computes the summary statistics, specifically the averages and
* sample sizes, for certain available characteristics of the BRFSS 
* Massachusetts data. Unlike the OHIE summary statistics, where we compute
* averages for the outcomes and the predicted outcomes, we compute only
* averages for certain characteristics. We report these averages in the bottom
* panel of the "ohie_brfss_sumstats_new" table. The OHIE summary statistics 
* reported in the top panel of this table are output by the "ohie_sumstats.do" file	

* OUTPUT:								
* [*]	brfss_sumstats: This output file contains the averages
*	and sample counts reported in the bottom panel of the 
*	"ohie_brfss_sumstats_new" table.					

if `brfss_diff_diff' do "$dofiles/brfss_diff_diff.do"



/* Load the extrapolation_late.do file */

* This code uses the MTO(p), MUO(p), MTE(p) estimated in the OHIE 1 lottery
* entrant sample for each respective bootstrap iteration to estimate the LATE
* in the BRFSS data using the pB, pI, and s(pI) estimated in the BRFSS data
* using frequency weights. The pB, pI, and s(pI) are also bootstrapped and
* therefore it is of utmost importance to extract the MTO(p), MUO(p), and MTE(p)	
* information from the OHIE for the appropriate bootstrap replication. The 
* output from this code is used to plot "MA LATE" on the MTE(p) line of the
* "extrapolation_or_ma_Y_num" figure in the paper.		
															
* OUTPUT:														
* [*]	extrapolation_late: This data set contains the OHIE LATE for
*	the sample of 1 lottery entrant, the BRFSS LATE, and the difference
* 	between the two.				

if `extrapolation_late' do "$dofiles/extrapolation_late.do"



/* Load the extrapolation_slate.do file */

* This code uses the linear MTO(p), MUO(p), MTE(p) estimated using
* the common covariates in the OHIE 1 lottery entrant sample for	
* estimating the SLATE in the following scenarios:
*	- Using Oregon Xs and Oregon ps 				
*	- Using Massachusetts Xs and Massachusetts ps			
* The code only works for the linear MTE with covariates, where	
* the covariates are the common covariates (age, gender, English,	
* and all two-way interactions between those). Using Xs for	
* a particular state means we use the individual level data from	
* that data set, and using the ps from a particular data set means
* that we use the propensity score coefficients we obtain from	
* that data. This code outputs data that is used to plot
* the E[MTE(p,X_MA)] line in Figure "extrapolation_or_ma_Y_num" 	
* of the paper.							
*									
* OUTPUT:								
* [*]	extrapolation_slate: This data set contains data 	
*	for above-mentioned figure.					
		

if `extrapolation_slate' do "$dofiles/extrapolation_slate.do"



/* Load the late_reweighting.do file */

* This code calculates LATE in the Oregon sample reweighted to reflect the 
* distribution of covariates in Massachusetts. The output from this file is not
* used in any exhibit of the paper. However, it is used for the following in-text
* statistic mentioned under the subsection "LATE-Reweighting with Common 
* Observables Cannot Reconcile LATEs": "This approach yields an increase of 0.23 visits
* among Massachusetts compliers, which is positive and therefore cannot 
* reconcile the results."
* OUTPUT:								
* [*]	late_reweighting.xls: Contains the re-weighted LATE. This code also calculates
*	the local called `bLATE2' whose value is used in the sentence: "This 
*	approach yields an increase of 0.23 visits among Massachusetts compliers, 
*	which is positive and therefore cannot reconcile the results."

if `late_reweighting' do "$dofiles/late_reweighting.do"


******************************END***********************************************

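* Close the log only if one is open. The log setup above is commented out by
* default, so a plain -log close- would stop with a "no log file open" error.
cap log close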
